A DESCRIPTION IS GIVEN OF ITEM SAMPLING, OR
"STATISTICAL-PSYCHOMETRIC INFERENCE," AN APPROACH TO
GATHERING AND USING EDUCATIONAL DATA THAT ALLOWS STATISTICAL
INFERENCES TO BE MADE SIMULTANEOUSLY WITH PSYCHOMETRIC
INFERENCES. THIS IS A PROCEDURE IN WHICH BOTH PEOPLE AND
ITEMS ARE SAMPLED AND THE DATA FROM A SAMPLE OF PEOPLE TAKING
A SAMPLE OF ITEMS ARE USED TO DRAW INFERENCES ABOUT THE
PERFORMANCE OF A POPULATION OF SUBJECTS TAKING A POPULATION
OF ITEMS. FIVE GENERAL CASES FOR EDUCATIONAL RESEARCH ARE
PRESENTED THAT INCLUDE MOST POSSIBLE USES OF ITEM SAMPLING: (1)
STATISTICAL OR PSYCHOMETRIC INFERENCE (SIMPLE ITEM OR PEOPLE
SAMPLING), (2) SIMPLE STATISTICAL-PSYCHOMETRIC INFERENCE
(SIMPLE ITEM AND PEOPLE SAMPLING), (3)
STATISTICAL-PSYCHOMETRIC INFERENCE WITH TWO SUBJECT SAMPLES
AND ONE ITEM SAMPLE (SIMPLE HYPOTHESIS TESTING), (4)
STATISTICAL-PSYCHOMETRIC INFERENCE WITH MULTIPLE ITEM
SAMPLING AND MULTIPLE SUBJECT SAMPLING, AND (5)
STATISTICAL-PSYCHOMETRIC INFERENCES WITH MULTIPLE ITEM
SAMPLING AND MULTIPLE SUBJECT SAMPLING IN THE CASE OF TWO
SUBJECT POPULATIONS (HYPOTHESIS TESTING). THESE FIVE CASES
BEGIN WITH THE SIMPLEST SITUATIONS AND BECOME MORE
COMPLEX, AND THE FIRST THREE USE ITEM SAMPLING ONLY
SUPERFICIALLY. THE AUTHORS NOTE THAT ITEM SAMPLING PROVIDES A
TECHNIQUE FOR OBTAINING EVALUATIVE DATA ON CLASS OR GROUP
PERFORMANCE, NOT INDIVIDUAL PERFORMANCE, AND CAN BE
ESPECIALLY EFFECTIVE IN CHANGE OR GROWTH STUDIES. (JH)
THE CENTER FOR THE STUDY OF EVALUATION OF INSTRUCTIONAL
PROGRAMS is engaged in research that will yield new ideas
capable of analyzing and evaluating instructional programs.
Staff members are creating new ways to evaluate the content
of curricula, methods of teaching and the multiple effects
of both on students. The CENTER is unique because of its
access to Southern California's elementary, secondary and
higher schools of diverse socio-economic levels and cultural
backgrounds. Three major aspects of the program are:

Instructional Variables - Research in this area is
concerned with identifying and evaluating the effects
of instructional variables, and with the development
of conceptual models, learning theory and theory of
instruction. The research involves the experimental
study of the effects of differences in instruction as
they may interact with individual differences among
students.

Contextual Variables - Research in this area will
be concerned with measuring and evaluating differ-
ences in community and school environments and the
interactions of both with instructional programs.
It will also involve evaluating variations in stu-
dent and teacher characteristics and administrative
organization.

Criterion Measures - Research in this field is
concerned with creating a new conceptualization of
the evaluation of instruction and with developing new
instruments to evaluate knowledge acquired in school
by measuring observable changes in cognitive, affective
and physiological behavior. It will also involve
evaluating the cost-effectiveness of instructional
programs.

Item Sampling in Educational Research
T. R. Husek and Ken Sirotnik

Most educational researchers collect empirical data as part
of their endeavors. Usually, the data are the responses of a
number of subjects to a collection of items. It is hoped that the
data will shed light on the substantive research questions. Some-
times, the responses of the individual students to the individual
items are the data of interest, while at other times, the responses
of the individual students to the total pool of items are desired.
Occasionally, some indices of the responses of the total group of
subjects to the entire test are examined, for example, the mean
and standard deviation of the test scores.

In none of the above situations does the researcher go
beyond the data which have been collected; his aim is merely to
describe in some manner the responses of the subjects who were
tested to the items that were used. The analysis of the data
ranges from a simple report of the responses of the students to
the computation of summary statistics on the performance of the
group. Statisticians refer to the collection of possible analyses
as descriptive statistics; that is, descriptive statistics are
used when the researcher does not want to make any generalizations
beyond the data which have actually been collected. Often, however,
the educational researcher wants to make statements about larger
groups of data, which have not been collected but which might have
been collected. For example, a population of subjects is postulated,
a sample of subjects is drawn from the population, the sample of
subjects is given a test, and the results of the measurement
procedure are analyzed not so much with a view to describing
the sample results but with the aim of making statements about
the inferred performance of the total population of subjects
on the test. The performance of the sample of subjects is, in
itself, of secondary importance; the primary interest is in
making inferences from the sample results to the population.

The targets of investigation are not the mean and standard 
deviation of the sample, but the estimated mean and standard 
deviation of the population. 

For every descriptive statistic that might be computed
for a sample of people, there is a corresponding parameter for
the population of people from which the sample was drawn, and
there is a large collection of statistical procedures for
making inferences about a population of people based on the
data collected from a sample of people. There is, moreover, a
growing tendency to use the term "statistical inference" to
describe the activity of using data on a sample of people to
make inferences about a population of people.

It is also possible to make inferences about a population
of items; that is, if a group of subjects is given a collection
of items, it is possible to treat the items as a sample from a
population of items and the tested subjects as the only subjects
of interest. In this case, the inference is from the performance
of the subjects on the sample of items to the performance of the
same subjects on a population of items. "Psychometric inference"
is the term used to describe this activity. It is important to
keep statistical inference (where the inference is from a sample
of people taking a fixed set of items to a population of people
and the same items) separate from psychometric inference (where
the inference is from a fixed number of subjects and a sample of
items to the same subjects and a population of items).

Recently, attempts have been made to develop systematic
procedures for the case in which both people and items are sampled
and where the inference is from a sample of people taking a sample
of items to the performance of a population of subjects taking a
population of items. To put the point more clearly, a random
sample of subjects responds to a random sample of items, and
statistical inferences are made simultaneously with psychometric
inferences. Most of the work has been done by Frederic Lord.

In his publications, Lord uses the term "item sampling" to
cover situations in which items are sampled. Since items are sampled
in the case we have labeled "psychometric inference" and also in
the case where both subjects and items are sampled, we have coined
a new term, "statistical-psychometric inference," to use in the
latter situation. This is a paper about "item sampling," but we
have chosen to divide the general topic into two more specific
cases. It should be understood from the outset that our treatment
of "statistical-psychometric inference" is not different from
Lord's presentations on "item sampling," and is often based on
Lord's analyses.

The three types of inference described above should be
kept separate from another kind of inference, the non-statistical
inference from a population that was randomly sampled to a larger
target population that was not randomly sampled. In "statistical
inference," in "psychometric inference," and in "statistical-
psychometric inference," the inference is from a random sample of
items and/or people to a population of people and/or items. The
major strength of these inferences is that probability theory can
be used to describe their accuracy, and the use of probability
requires that random sampling be used. In much educational research,
the population that is randomly sampled is not really the major
target population, and a non-statistical inference is made only
after the statistical analysis is finished. The following situation
is an example. A researcher has access to a school district. He
obtains a random sample of students from the school district and
obtains data on the random sample of subjects. He uses statistical
inference procedures to generalize from the performance of the
sample to the population of subjects in the school district. But
he really wants to generalize to all the schools in the area, per-
haps in the county, perhaps in the state. These generalizations
are made all the time, and they are not necessarily restricted to
educational research. Yet they are not statistical in nature;
they are not really based on probability but, rather, on the
judgment of the researcher that he will not do violence to his
conclusions if he makes a larger generalization than is
completely warranted by his data gathering procedures.

These, then, are the major uses to which the educational
researcher puts the data he collects. He might just desire to
describe the available data; he might want to make a "statistical
inference"; he might want to make a "psychometric inference"; he
might want to make a "statistical-psychometric inference"; and he
might want to make an "extra-statistical inference." The various
inferences are pictured in Figure 1.

Hopefully, the foregoing has achieved its purpose: it
has served as an introduction, placing item sampling in perspec-
tive.

If the reader turns to the literature, he will find that
little has been written about item sampling. The other kinds of
inference have been widely used and are extensively described
in statistics books and philosophy of science texts. "Statistical-
psychometric inference," however, is a new field, and Frederic
Lord has been primarily responsible for its development. Lord's
interest has been that of a psychometrician, and he treats item
sampling both as a basis for test theory and as a novel technique
for gathering data in some situations. We are not treating item
sampling as a basis for test theory in this paper; our interest,
instead, is in trying to explain item sampling as a new approach
to gathering educational data, an approach which appears in many
cases to be far superior to existing techniques.

In the next section of this paper, five general cases for
educational research will be presented. We hope that these five
cases will include most possible uses of item sampling. Not
every case will, strictly speaking, include a different use of
item sampling, but some seemingly extraneous material is included
for clarity of presentation. At the conclusion of the paper, some
general statements about item sampling will be made. For the
time being, however, let us make only one assertion: while item
sampling is not generally useful in making statements about the
performance of individual subjects, it is useful in making
inferences about the performance of a population of subjects
on a population of items.

The Use of Item Sampling in Educational Research

In the following five sections, it will be assumed that the
aim of the research is, in part, to estimate the mean and variance
of the population from sample data. Where several alternative
procedures are available for doing this, suggestions will be
made as to how to select the most appropriate alternative.
Furthermore, it will be assumed that all $x_{gi}$ (item scores) are
binary in that they can take on either the value 1 (if correct)
or the value 0 (if incorrect). The terms "population of subjects"
and "population of items" will occur frequently. It is important
to note that these populations, although always finite, may at
times be considered to be infinite. For example, the researcher
who randomly samples, say, 30 secondary students from California
regards his population, for all practical purposes, as being
infinite. In fact, the multiplicative correction factors for
finite populations, employed in the estimation formulas for popu-
lation parameters, quickly approach 1 as the population size
increases relative to the sample size. It is for this reason
that most chapters on statistical inference ignore these correction
factors. Although in item sampling the populations are often
fairly small, little accuracy will be lost if they are
considered infinite. Furthermore, the item sampling formulas
for the various situations (viz., both item and subject popu-
lations infinite, both finite, or one finite and the other
infinite) are rather complicated; for all practical purposes,
the usual formulas for statistical inference will serve adequately.
Further discussion of this latter point will follow.
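The claim that the correction factors approach 1 can be checked directly. The sketch below assumes the standard correction for sampling without replacement, (pop_size - sample_size) / (pop_size - 1), which multiplies the variance of a sample mean; the function name `fpc` is our own label, not the authors'.

```python
def fpc(pop_size: int, sample_size: int) -> float:
    """Finite population correction for the variance of a sample mean
    under sampling without replacement: (pop_size - sample_size) / (pop_size - 1)."""
    if pop_size < 2 or not 1 <= sample_size <= pop_size:
        raise ValueError("need pop_size >= 2 and 1 <= sample_size <= pop_size")
    return (pop_size - sample_size) / (pop_size - 1)


if __name__ == "__main__":
    # 30 students drawn from populations of increasing size
    for pop in (100, 1_000, 100_000):
        print(f"population {pop:>7}: correction factor {fpc(pop, 30):.4f}")
```

For a sample of 30 the factor is about 0.71 when the population has 100 members, about 0.97 for 1,000, and above 0.999 for 100,000, which is why it is usually dropped.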

The following cases begin with the simplest situations
and become more complex. The presentations of the simpler uses
of item sampling are, we hope, clear; the more complex cases
are built on the earlier material. The reader is requested to
be patient if the descriptions at first do not seem appropriate
for his needs, for the later cases make much more efficient use
of item sampling. The reader may also feel that statements about
optimal sample sizes are not sufficiently specific. They are not
specific because they are not known. However, as discussed later
in the section on general advantages, the exact specification of
the optimal procedure is not always necessary, especially in
situations where the alternative to an adequate, but perhaps non-
optimal, procedure is no research at all.

CASE I: Statistical or Psychometric Inference:

Simple Item or People Sampling

If a sample of people is drawn from a population of subjects
and is given a fixed set of items, simple formulas available in
most elementary statistics texts are adequate to provide estimates
of the mean and variance for the population. The mean of the
sample is an unbiased estimator of the mean of the population, and
the sample variance (multiplied by a correction factor) is an
unbiased estimator of the population variance. Written in relative
score notation (see Appendix 2 for all notation):



$$\hat{\mu}_y = \bar{y} \tag{1}$$

where

$$\bar{y} = \frac{1}{N}\sum_{g=1}^{N} y_g , \tag{2}$$

and

$$\hat{\sigma}_y^2 = \frac{N}{N-1}\, s_y^2 \tag{3}$$

where

$$s_y^2 = \frac{1}{N}\sum_{g=1}^{N} \left(y_g - \bar{y}\right)^2 . \tag{4}$$

If items (and not subjects) are sampled, the procedures are the
same. The formulas, however, look a little different:



$$\hat{\mu}_p = \bar{p} \tag{5}$$

where

$$\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i , \tag{6}$$

and

$$\hat{\sigma}_p^2 = \frac{n}{n-1}\, s_p^2 \tag{7}$$

where

$$s_p^2 = \frac{1}{n}\sum_{i=1}^{n} \left(p_i - \bar{p}\right)^2 . \tag{8}$$

It can be shown quite easily that

$$\bar{y} = \bar{p}, \qquad \hat{\mu}_y = \hat{\mu}_p \tag{9}$$

(see Appendix 1, Case I, note). Case I is the simplest case,
and, as previously indicated, the inferential procedures for
Case I are given in most elementary statistics texts. (Simple
examples using Case I are outlined in Appendix 1.)
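Formulas (1) through (9) can be sketched for a complete response matrix. Because Appendix 2 is not reproduced here, we assume (our reading of "relative score notation") that $y_g$ is the proportion of items subject $g$ answered correctly and $p_i$ the proportion of subjects answering item $i$ correctly; the function name and the small data set are hypothetical.

```python
from statistics import fmean


def case_one_estimates(x: list[list[int]]) -> dict[str, float]:
    """Formulas (1)-(8) for a complete 0/1 response matrix.

    x[g][i] is 1 if sampled subject g answered item i correctly, else 0.
    Rows are the N subjects, columns the n items (every subject took every item).
    """
    N, n = len(x), len(x[0])
    y = [sum(row) / n for row in x]                         # relative scores y_g
    p = [sum(row[i] for row in x) / N for i in range(n)]    # item difficulties p_i
    y_bar, p_bar = fmean(y), fmean(p)                       # formulas (2) and (6)
    s2_y = sum((yg - y_bar) ** 2 for yg in y) / N           # formula (4)
    s2_p = sum((pi - p_bar) ** 2 for pi in p) / n           # formula (8)
    return {
        "mu_y": y_bar,                   # formula (1)
        "sigma2_y": N / (N - 1) * s2_y,  # formula (3)
        "mu_p": p_bar,                   # formula (5)
        "sigma2_p": n / (n - 1) * s2_p,  # formula (7)
    }


if __name__ == "__main__":
    # Hypothetical responses of 3 subjects to 4 items
    x = [[1, 1, 0, 1],
         [0, 1, 0, 0],
         [1, 1, 1, 1]]
    est = case_one_estimates(x)
    print(est)
    # Formula (9): the mean over subjects equals the mean over items
    assert abs(est["mu_y"] - est["mu_p"]) < 1e-12
```

Formula (9) holds here because every subject answered every item, so both means are the proportion of 1s in the whole matrix.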



CASE II: Simple Statistical-Psychometric Inference:

Simple Item and People Sampling

In this case, a sample of people is given a sample of items,
and the obtained data are used to generate estimates of the performance
of the population of people on the population of items. The formulas
for estimating the mean and variance of the population of subjects
taking the population of items are either of formulas (1) or (5)
for the mean and

$$\hat{\sigma}_y^2 = \frac{N(\mathbf{N}-1)}{\mathbf{n}(n-1)\,\mathbf{N}(N-1)}\left\{ n(\mathbf{n}-1)\,s_y^2 - (\mathbf{n}-n)\left[\bar{y}(1-\bar{y}) - s_p^2\right]\right\} \tag{10}$$

for the variance, where $\mathbf{N}$ and $\mathbf{n}$ denote the sizes of the
subject and item populations. Notice the difference between the
estimates of $\sigma_y^2$ as given by (3) and by (10). Formula (10) is far
more complex for two reasons: it assumes (a) that the item and subject
populations are both finite and (b) that the inference must now be
made not only to a population of examinees but also to a population
of items. The reader should recall that in the section preceding
Case I it was recommended that both item and subject populations be
treated as being infinitely large. This would modify (10) as
follows:



$$\hat{\sigma}_y^2 = \frac{N}{(n-1)(N-1)}\left\{ n\,s_y^2 - \left[\bar{y}(1-\bar{y}) - s_p^2\right]\right\} \tag{11}$$

It was also stated that the usual formulas for statistical inference
can be used in all item sampling situations. In other words,
although formula (11) gives the exact estimate of $\sigma_y^2$ (assuming
infinite populations), formula (3) will also give this estimate,
not quite as accurately, but adequately and simply. Computationally,
then, we can treat Case II in the same manner as we treated Case I;
the statistical inferences we make, however, are to two populations
instead of only one. (The difference between the various formulas
is illustrated with a computational example in Appendix 1, Case II,
note.)
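The three variance estimates can be compared side by side. This is a sketch under stated assumptions: reading the bold symbols in formula (10) as the population sizes of subjects and items is our reconstruction, and the function names are hypothetical. As a consistency check, when the whole population is observed (sample sizes equal to population sizes) formula (10) reduces to $s_y^2$, as it should. In small samples an estimate of this kind can come out negative.

```python
from statistics import fmean


def case_two_variance(x, pop_subjects=None, pop_items=None):
    """Estimate sigma_y^2 from N sampled subjects who all took the same n sampled items.

    x[g][i] is a 0/1 item score. With both population sizes None the populations
    are treated as infinite (formula (11)); otherwise both sizes must be given and
    the finite-population version (formula (10)) is used.
    """
    N, n = len(x), len(x[0])
    y = [sum(row) / n for row in x]
    p = [sum(row[i] for row in x) / N for i in range(n)]
    y_bar, p_bar = fmean(y), fmean(p)
    s2_y = sum((v - y_bar) ** 2 for v in y) / N
    s2_p = sum((v - p_bar) ** 2 for v in p) / n
    bracket = y_bar * (1 - y_bar) - s2_p
    if pop_subjects is None and pop_items is None:              # formula (11)
        return N / ((n - 1) * (N - 1)) * (n * s2_y - bracket)
    if pop_subjects is None or pop_items is None:
        raise ValueError("give both population sizes or neither")
    factor = N * (pop_subjects - 1) / (pop_items * (n - 1) * pop_subjects * (N - 1))
    return factor * (n * (pop_items - 1) * s2_y - (pop_items - n) * bracket)  # (10)


def formula_three(x):
    """The ordinary shortcut N / (N - 1) * s_y^2 from Case I."""
    N, n = len(x), len(x[0])
    y = [sum(row) / n for row in x]
    y_bar = fmean(y)
    return sum((v - y_bar) ** 2 for v in y) / (N - 1)


if __name__ == "__main__":
    x = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1]]  # hypothetical data
    print(case_two_variance(x), formula_three(x))
```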



At this point, item sampling procedures provide the investiga-
tor with a number of alternatives which are not ordinarily available
to him; for, if the researcher has a given population of subjects
and a given population of items, he must make some decision about
how many items and how many subjects to sample. He might give all
the subjects all of the items; he might give all the subjects one
of the items; he might give one subject all the items; or he
might give some of the items to some of the subjects. The question
is which of the possible data gathering procedures is most
efficient. To help explicate the issue, an example will be
presented. It will also be used to illustrate later points.

Let us consider a large high school which has been
given some funds to establish a remedial reading program for
some of its students. Six hundred students are eligible for
the program, and 300 of them are selected at random and placed
into the program. The school staff develops a set of 100 items
based on the instructional objectives that they develop for the
special program. The school administrators would like to make
some statements about the effectiveness of the program. It is
decided that because of time and cost considerations, only 900
observations are feasible. (An observation is defined as the
response of one student to one item.) The number 900 is not
magical. There are usually some restraints on the amount of data
that can be collected, often based on administrative constraints
or cost problems. The number 900 was chosen just to make the
example more specific.

Thus, for Case II, there is a given population of 300
subjects, a given population of 100 items, and the constraint that
only 900 observations are possible. What is more, in Case II, the
researcher is restricted to giving one sample of items to one
sample of subjects. However, there are still a large number of
alternatives: three items could be given to 300 subjects; 100
items could be given to a sample of nine subjects; and, of course,
there are many possibilities between these extremes.
The following are the criteria for selecting data gather-
ing procedures. First, it is necessary to obtain some information
about the relative homogeneity of the population of subjects and
of the population of items. More specifically, it is important
to have some idea of the variance of relative observed scores of
the population of subjects ($\sigma_y^2$) and of the variance of the item
difficulties in the population of items ($\sigma_p^2$). If the items are
homogeneous in this special sense and if the subjects are quite
variable, it is advantageous to take a small sample of items and
give these items to a large sample of subjects. If the items
are variable and the subjects are homogeneous, it is advantageous
to give a large sample of items to fewer subjects. It is difficult
to specify the exact number of items and subjects to sample
(the reader is referred to Lord (1965) for a more complete treat-
ment of the topic), but the criteria are very useful as guide-
lines. It is not necessary to actually know the two variances
mentioned above, for available guesses would probably be quite
useful.
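The direction of these guidelines can be illustrated with a rough ranking of designs. This sketch is our own approximation, not Lord's exact treatment: it assumes infinite populations and an additive random-effects model in which the sampling variance of the grand mean is var_subjects / N + var_items / n + var_residual / (nN). With nN fixed at the observation budget, the residual term is the same for every design and is dropped. The two variances are guesses, as the text permits, and all names are hypothetical.

```python
def rank_allocations(budget, max_items, max_subjects, var_subjects, var_items):
    """Rank (n items, N subjects) designs with n * N == budget.

    Score = var_subjects / N + var_items / n, the part of the (approximate)
    sampling variance of the grand mean that depends on the design.
    """
    designs = []
    for n in range(1, max_items + 1):
        if budget % n:
            continue
        N = budget // n
        if N <= max_subjects:
            designs.append((var_subjects / N + var_items / n, n, N))
    return sorted(designs)


if __name__ == "__main__":
    # Remedial reading example: 100 items, 300 students, 900 observations.
    # The two variances below are hypothetical guesses.
    for score, n, N in rank_allocations(900, 100, 300, var_subjects=0.04, var_items=0.01)[:3]:
        print(f"{n:>3} items x {N:>3} subjects: score {score:.6f}")
```

Under this approximation the best ratio of subjects to items is roughly the ratio of the two variances, so more variable subjects (or more homogeneous items) push the design toward few items and many subjects, in line with the guidelines above.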

In the example of the high school remedial program, it
should not be difficult to estimate the variability of the sub-
jects and the variability of the item difficulties. If the items
are well tied to the objectives of the program, it is probable
that the items will be less variable than the subjects, and a few
items should be given to a larger group of subjects. The reader
who is concerned about the restraints imposed on the example is
requested to be patient, because Case II, in itself, is not an
efficient use of item sampling. (A simple example using Case II
is outlined in Appendix 1.)
CASE III: Statistical-Psychometric Inference
With Two Subject Samples and
One Item Sample: Simple Hypothesis Testing

In this case, the researcher wishes to perform an experi-
ment and test the hypothesis that the mean score of one population
of people is the same as the mean score of another population;
that is, the researcher has two populations of subjects and intends
to take two random samples of subjects from these populations,
give the two subject samples the same sample of items, and test
this hypothesis:

$$H_0\colon \mu_{y'} = \mu_{y''}$$

It may be useful to use the high school remedial reading
example here. The high school has 600 students who are eligible
for the remedial reading program. Three hundred are randomly
assigned to the special program, and 300 receive instruction
under the "regular" program. It is important to note that the
two groups of 300 subjects can be described in two ways. On the
one hand, they can be viewed as two random samples from a larger
population of 600 subjects. On the other hand, they can be viewed
as two populations from which smaller samples can be randomly drawn.
If this distinction is clear, the researcher's procedure is not
too difficult. From the two populations of 300 each, he draws two
random samples of subjects and gives the two samples of subjects
one sample of items, perhaps 25 items, randomly sampled from the
pool of 100 items. He uses the formulas suggested for Case II to
estimate the mean and variance of the two populations of 300 subjects.
Finally, a simple t-test would be appropriate to test the null
hypothesis that the special remedial reading group is no different
from the "regular" group.
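A minimal sketch of the simple test, assuming the pooled-variance form of the independent-samples t (the text says only "a simple t-test," so choosing the pooled form over, say, Welch's version is our assumption). The inputs are the relative scores of the two subject samples on the shared item sample; the function name is hypothetical.

```python
import math
from statistics import fmean, variance


def simple_t(scores_1, scores_2):
    """Pooled-variance t for two independent subject samples; returns (t, df).

    Both samples took the SAME item sample. Like the test described in the
    text, this treats the items as fixed and ignores the covariance between
    the two means that the shared items induce.
    """
    n1, n2 = len(scores_1), len(scores_2)
    # statistics.variance uses the N - 1 divisor, i.e. formula (3)
    pooled = ((n1 - 1) * variance(scores_1) + (n2 - 1) * variance(scores_2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (fmean(scores_1) - fmean(scores_2)) / se, n1 + n2 - 2


if __name__ == "__main__":
    special = [0.6, 0.8, 0.7, 0.9]   # hypothetical relative scores
    regular = [0.5, 0.6, 0.4, 0.5]
    print(simple_t(special, regular))
```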

It should be recognized, however, that certain assumptions are
being made by such a test. Specifically, we are, in effect,
pretending that the items were not a sample, implying that
$\bar{y}'$ and $\bar{y}''$ would be uncorrelated scores (i.e., $\operatorname{Cov}(\bar{y}', \bar{y}'') = 0$),
in which case the usual t-test for independent samples could be
used. In actuality, however, $\bar{y}'$ and $\bar{y}''$ are not independent,
being dependent on the particular item sample which, in fact, has
been drawn. The consequences of using the simple t-test are
evident if we inspect $\sigma^2_{\bar{y}'-\bar{y}''}$, the sampling variance of the
difference between the two sample means. As always,

$$\sigma^2_{\bar{y}'-\bar{y}''} = \sigma^2_{\bar{y}'} + \sigma^2_{\bar{y}''} - 2\,\sigma_{\bar{y}'\bar{y}''} \tag{14}$$

where

$$\sigma_{\bar{y}'\bar{y}''} = \rho_{\bar{y}'\bar{y}''}\,\sigma_{\bar{y}'}\,\sigma_{\bar{y}''} = \operatorname{Cov}(\bar{y}', \bar{y}''). \tag{15}$$

To the extent that the relative scores in the two experimental
conditions are positively correlated (a condition which will be
seen almost always to prevail), $\sigma^2_{\bar{y}'-\bar{y}''}$ will be less than
$\sigma^2_{\bar{y}'} + \sigma^2_{\bar{y}''}$, making the suggested t-test somewhat conservative.
Again, however, the increase in sensitivity gained by using the exact
t-test (see formula (16)) is, for all practical purposes, minimal.
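Formulas (14) and (15) show why the simple test is conservative: it uses $\sigma^2_{\bar{y}'} + \sigma^2_{\bar{y}''}$, as if the correlation were 0. The short sketch below (hypothetical variances; the function name is ours) computes the true sampling variance of the difference for a few correlations.

```python
import math


def var_of_difference(var_1, var_2, rho):
    """Formulas (14)-(15): var' + var'' - 2 * rho * sd' * sd''."""
    return var_1 + var_2 - 2 * rho * math.sqrt(var_1) * math.sqrt(var_2)


if __name__ == "__main__":
    v = 0.0004  # hypothetical sampling variance of each group mean
    for rho in (0.0, 0.3, 0.6):
        ratio = var_of_difference(v, v, rho) / var_of_difference(v, v, 0.0)
        print(f"rho = {rho:.1f}: true variance is {ratio:.0%} of the variance the simple test assumes")
```

With equal variances the true variance is a fraction 1 - rho of the assumed one, so the simple test's denominator is too large and its t too small whenever rho is positive.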

Evaluation of the sensitivity of the t-test becomes a far
more important consideration when determining appropriate criteria
for the number of items relative to the number of subjects to
sample. Now, to the extent that $\sigma^2_{\bar{y}'-\bar{y}''}$ is small, the t-test
will be increasingly sensitive. Let us suppose that the students
in the special program do not differ substantially from the students
in the regular program except in mean level of performance.
Statistically, this assumption implies (a) that the correlation
between the item difficulties for the two groups is close to 1
and (b) that the variances of the item difficulties for the two
groups are nearly equal. In other words, the covariance of the
item difficulties between the two groups is nearly equal to the
variance of the item difficulties across both groups. Under
these circumstances, it can be shown (see Lord, 1965) that
where $r_{20}$ = Kuder-Richardson formula 20 coefficient of reliability
(i.e., internal consistency) over the total item pool.

It should be clear that decreasing n (as long as n > 1) and
increasing N (while holding nN constant) would serve to decrease
$\sigma^2_{\bar{y}'-\bar{y}''}$ (and thus increase sensitivity) so long as $r_{20}$ is
positive. Since tests with negative internal consistency are
difficult to devise in the achievement realm, the criterion of
positive internal consistency is not difficult to meet in
educational evaluation research. In other words, assuming
nearly equal variances of, and high positive correlations
among, item difficulties between the two groups, it will be
advantageous to give fewer items to more subjects as long as
the Kuder-Richardson internal consistency of the total item pool
is positive.
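Checking the positivity condition requires $r_{20}$ for the item pool. The sketch below computes the standard KR-20 coefficient, k / (k - 1) * (1 - sum of p_i q_i / variance of total scores), from a subjects-by-items 0/1 matrix; it assumes such a matrix (for example, from an earlier full administration of the pool) is available, and the function name is our own.

```python
def kr20(x):
    """Kuder-Richardson formula 20 for a subjects-by-items matrix of 0/1 scores,
    using N divisors for both the item p*q terms and the total-score variance."""
    N, k = len(x), len(x[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in x]
    mean_total = sum(totals) / N
    var_total = sum((t - mean_total) ** 2 for t in totals) / N
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")
    p = [sum(row[i] for row in x) / N for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return k / (k - 1) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    x = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1]]   # hypothetical data
    print(kr20(x))                          # positive
    print(kr20([[1, 0], [0, 1], [1, 1]]))   # a contrived negative case
```

The second call shows that a negative value is arithmetically possible, though, as the text notes, it is unusual for achievement items.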

CASE IV: Statistical-Psychometric Inference With Multiple
Item Sampling and Multiple Subject Sampling

The methods described thus far use item sampling only
superficially. A more efficient procedure is to administer
different samples of items to different samples of subjects;
that is, a random sample of subjects is drawn from the population
of subjects and given a sample of items, and the mean and variance
for the population of subjects taking the population of items are
estimated according to the formulas given in Case II. Another
random sample of subjects, non-overlapping with the first, can then
be drawn and given a sample of items. Estimates of the mean and
variance are again made. This is repeated with non-overlapping,
equal-sized samples of subjects. The estimates of the mean and
variance are pooled (add up all the estimates and divide by the
number of estimates) to provide a single estimate of the mean and
a single estimate of the variance of the population of subjects
taking the population of items. (See Appendix 1 for a computational
example.)
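One way to lay out such a design is sketched below. The scheme is our assumption, not the paper's prescription: the item pool is split at random into k disjoint, equal item samples (so every item is used exactly once), each paired with its own non-overlapping subject sample. The per-sample (mean, variance) estimates would come from the Case II formulas and are then averaged, as described above. The numbers mirror the 900-observation budget of the remedial reading example; function names are hypothetical.

```python
import random
from statistics import fmean


def multiple_matrix_design(num_subjects, num_items, k, subjects_per_sample, seed=0):
    """Pair k non-overlapping subject samples with k disjoint item samples."""
    if num_items % k:
        raise ValueError("this sketch assumes k divides the number of items")
    if k * subjects_per_sample > num_subjects:
        raise ValueError("not enough subjects for k non-overlapping samples")
    rng = random.Random(seed)
    subjects = rng.sample(range(num_subjects), k * subjects_per_sample)
    items = rng.sample(range(num_items), num_items)  # random permutation of the pool
    size = num_items // k
    return [
        (subjects[j * subjects_per_sample:(j + 1) * subjects_per_sample],
         items[j * size:(j + 1) * size])
        for j in range(k)
    ]


def pool_estimates(estimates):
    """Average the (mean, variance) estimates obtained from each sample."""
    means, variances = zip(*estimates)
    return fmean(means), fmean(variances)


if __name__ == "__main__":
    # 10 item samples of 10 items, 9 students each: 10 * 9 * 10 = 900 observations
    design = multiple_matrix_design(300, 100, k=10, subjects_per_sample=9)
    print(sum(len(s) * len(i) for s, i in design))
```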

In this use of item sampling, a number of non-overlapping
samples of subjects are given a number of samples of items. A
number of questions immediately arise. For instance, how many
people (or items) should be sampled? How large should the samples
be? How should they be structured? In answering these questions,
some guidelines might be postulated. It is not necessary to sample
the total population of subjects, although in many school settings
it may be just as easy to do so as not. It is important to sample
the entire population of items, for the exclusion of merely a few
items makes an important difference. It is important, also, to
have every item responded to as often as every other item and, if
necessary, data should be deleted to satisfy this suggestion.
What is more, it is better to have every item paired with every
other item at least once, and it is also desirable to have each
possible pair of items responded to as often as any other pair.
This last recommendation may not always be feasible. (Lord (1965)
refers to Fisher and Yates (1938), Tables 17-19, as providing
assistance in fulfilling the recommendation.)
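Whether a proposed set of item samples meets the two balance guidelines can be checked mechanically. With equal-sized item samples, a design meeting both is what the design literature calls a balanced incomplete block design, which is presumably why Lord points to the Fisher and Yates tables. The checker below and its seven-item example are our own illustrations.

```python
from collections import Counter
from itertools import combinations


def balance_report(item_samples, num_items):
    """Return (items_balanced, pairs_balanced) for a list of item samples:
    whether every item appears equally often, and whether every pair of
    items appears together equally often."""
    item_counts = Counter(i for sample in item_samples for i in sample)
    pair_counts = Counter(
        pair for sample in item_samples for pair in combinations(sorted(sample), 2)
    )
    items_balanced = len({item_counts[i] for i in range(num_items)}) == 1
    pairs_balanced = len({pair_counts[p] for p in combinations(range(num_items), 2)}) == 1
    return items_balanced, pairs_balanced


if __name__ == "__main__":
    # 7 items in 7 samples of 3: each item used 3 times, each pair exactly once
    bibd = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5)]
    print(balance_report(bibd, 7))               # both guidelines met
    print(balance_report([(0, 1), (2, 3)], 4))   # items balanced, pairs not
```

Note that a simple partition of the pool into disjoint item samples satisfies the first guideline but never the second, since items in different samples are never paired.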

If it is possible to follow the guidelines, the procedure
for fulfilling them is the optimal procedure for obtaining
the most information given that number of observations. Of course,
if resources permit, it is sometimes desirable to obtain more
observations for particular research purposes. Following the
guidelines also provides the best way of gathering more data.

CASE V: Statistical-Psychometric Inferences With
Multiple Item Sampling and
Multiple Subject Sampling in the Case of
Two Subject Populations: Hypothesis Testing

Hopefully, the reader has noticed that Case III was little
more than Case II extended to two populations of subjects, while
Case IV was an extension of Case II to multiple item-subject
samples. Case V is simply the next stage in this series of
extensions. There are two populations, and it is desired to use
the procedures of Case IV for each of the populations and then to
test a hypothesis about the means of the populations. Stated in
the framework of Case III and the remedial reading example, the
situation is as follows: the high school has a pool of 600 students
who are eligible for the special remedial reading program. Three
hundred are randomly assigned to the special program; 300 are given
the "regular" program. The mean and variance of the special group
are estimated by the procedures of Case IV. The same is done for
the "regular" group. Using these data on the two groups, a t-test
can be performed. A far more sensitive procedure,* however, would
be to block on the multiple item samples in a two-factor analysis
of variance design. Specifically, suppose 10 samples of items were
given to 10 samples of examinees, respectively, in each instructional
program. A 10 x 2 factorial design, then, can be used for the
analysis, with the following advantages: (a) it will be possible
to test not only for differences between instructional treatments,
but for differences between item samples and the interaction
between item samples and treatments; (b) the variance due to
differences among item samples will be partitioned out of the
error variance, making for a more sensitive test of treatment
effects. (See Appendix 1 for an illustrative example of such an
analysis.)
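The 10 x 2 analysis can be sketched as a balanced two-factor analysis of variance with several subjects per cell, each cell holding the relative scores of the subjects who took one item sample under one program. This is a sketch under stated assumptions: equal cell sizes, and F ratios formed against the within-cell mean square. If item samples are regarded as a random factor, many texts would instead test treatments against the interaction mean square; the sketch returns all mean squares so either choice can be made. Names and data are hypothetical.

```python
from statistics import fmean


def two_factor_anova(cells):
    """Balanced two-factor ANOVA with replication; returns (ss, df, ms, f).

    cells[a][b] holds the relative scores of the subjects who took item
    sample a under treatment b; every cell must have the same size.
    """
    A, B, r = len(cells), len(cells[0]), len(cells[0][0])
    if any(len(cell) != r for row in cells for cell in row):
        raise ValueError("this sketch needs equal cell sizes")
    grand = fmean(v for row in cells for cell in row for v in cell)
    m = [[fmean(cell) for cell in row] for row in cells]            # cell means
    a_mean = [fmean(row) for row in m]                              # item-sample means
    b_mean = [fmean(m[a][b] for a in range(A)) for b in range(B)]   # treatment means
    ss = {
        "item_samples": B * r * sum((x - grand) ** 2 for x in a_mean),
        "treatments": A * r * sum((x - grand) ** 2 for x in b_mean),
        "interaction": r * sum((m[a][b] - a_mean[a] - b_mean[b] + grand) ** 2
                               for a in range(A) for b in range(B)),
        "within": sum((v - m[a][b]) ** 2
                      for a in range(A) for b in range(B) for v in cells[a][b]),
    }
    df = {"item_samples": A - 1, "treatments": B - 1,
          "interaction": (A - 1) * (B - 1), "within": A * B * (r - 1)}
    ms = {key: ss[key] / df[key] for key in ss}
    f = {key: ms[key] / ms["within"] for key in ("item_samples", "treatments", "interaction")}
    return ss, df, ms, f


if __name__ == "__main__":
    # Tiny hypothetical example: 2 item samples x 2 programs, 2 subjects per cell
    cells = [[[1, 3], [5, 7]], [[2, 4], [8, 10]]]
    print(two_factor_anova(cells)[3])
```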




*Recommended by Lord in a personal communication.
There is an alternative procedure that can be used in this
situation, and in some instances it may be more desirable.
Consider the "regular" group of 300 subjects. Instead of using
item sampling to estimate the mean and variance of the 300
subjects on the 100 items, it is suggested that only the mean of
each item be estimated. This results in 100 numbers, each an
estimate of the mean of an item for the 300 subjects. The same
procedure can be followed for the 300 subjects in the special
reading group. Thus, 100 pairs of numbers are obtained, two
estimated item means for each of the 100 items. A t-test for
matched pairs can be performed on these data to test the
hypothesis of no difference between the groups. This test is
quite powerful and also allows the researcher to report and
examine the item data. Since item data can be very useful for
diagnostic purposes in examining the training program, this
approach may be the most valuable one for many evaluation
situations.
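A minimal sketch of the matched-pairs t-test on item means, assuming the per-item mean estimates for the two groups are already in hand. The items are the units of analysis, so the test has (number of items - 1) degrees of freedom. The five-item data below are hypothetical (the example in the text has 100 items), and `paired_t` is an illustrative name.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t(item_means_a, item_means_b):
    """Matched-pairs t-test on per-item mean estimates for two groups.

    Returns (t, df); the k items are the units of analysis, so df = k - 1.
    """
    if len(item_means_a) != len(item_means_b):
        raise ValueError("need one pair of estimates per item")
    diffs = [a - b for a, b in zip(item_means_a, item_means_b)]
    k = len(diffs)
    t = fmean(diffs) / (stdev(diffs) / sqrt(k))
    return t, k - 1


# Hypothetical item means for a 5-item illustration.
special = [.7, .6, .8, .5, .9]
regular = [.6, .5, .6, .5, .7]
t, df = paired_t(special, regular)
print(f"t = {t:.2f} on {df} df")  # t = 3.21 on 4 df
```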

General Advantages of and Cautions About Item Sampling

The central idea of item sampling is simple: it is not
necessary to give every item to every subject if one wishes to
estimate the performance of a group of subjects on a group of items.
Much of this paper has been concerned with technical discussions
of when item sampling would be an efficient data-gathering procedure.
In these technical presentations it is useful to examine the optimal
circumstances for various uses of item sampling. These technical
matters should not be allowed to obscure the fact that for many
educational issues, especially in the field of educational
evaluation, item sampling provides the only viable method for
collecting adequate data. The practical problems of educational
research often prohibit the use of students for more than a short
time but often do not seriously interfere with the testing of many
students. In these frequent cases the issue is not whether item
sampling will provide better parameter estimates than other
procedures. The point is that item sampling can be used and that
useful data can be collected. As mentioned on page 15, the t-test
that can be used is quite cautious, perhaps overly so; but it can
be performed. In many situations, item sampling not only permits
more efficient data collection but also allows the educational
researcher to perform research that might not otherwise be
possible.

However, one should not ignore two aspects of item sampling
work. The first has already been mentioned: item sampling is
designed to assist in the estimation of group statistics. The
ordinary use of item sampling does not lead to statements about
the performance of the individual subject. The other aspect of
item sampling that should not be ignored is that it is predicated
on the assumption that the response of a subject to an item is
independent of the context in which the item is presented. This
assumption is made whenever one works with tests and test theory,
but it is probably of more importance in item sampling, since each
subject receives only a few items.

Because of this, it is suggested that the researcher not conduct
one-item tests. What the size of the smallest test should be is
not known, but tests of three to five items would seem to be as
small as is desirable.

Some Possible Uses of Item Sampling 

To date, the authors have been able to find only one
published article that actually used item sampling (Plumlee,
1964). It is clear, therefore, that further empirical research
is required to examine the actual performance of item sampling
techniques in practice. These studies should be performed in
situations where it is possible to collect data in several ways,
and should compare various ways of estimating population
parameters on the basis of partial data.

But there are several possible uses of item sampling that
do not depend on the advantages of item sampling over other
techniques of data collection. These are uses of item sampling
for which any other manner of collecting data is inappropriate.
Two of these will be described briefly.

Most classroom tests are constructed for the purpose of
differentiating among students. The best test is said to be the
one with maximum variance, since that test will make it easiest
to assign grades. This criterion tends to eliminate those items
that every student either passes or misses; items at about the
50 per cent difficulty level are said to be better than other
items. In short, therefore, items constructed to ascertain whether
course objectives have been achieved are often eliminated from
classroom tests, since they are often items that most students
get correct: the good instructor achieves his goals, and the items
directly related to those goals are often passed by most students.
Item sampling provides a technique for obtaining evaluative data
on class performance, not individual performance, without
consuming a lot of testing time and without doing much damage to
the general goal of differentiating among students.

A classroom test can be constructed so that most of the items
are used to spread out the scores of the students for individual
evaluation purposes, while enough items are left for other
purposes if item sampling is used with these other items. If a
class has 100 students and the typical test for that class has 40
items, little damage would be done to the test by shortening the
"regular" part of the test to 35 items and using the remaining
five items per student for course evaluation purposes. Five items
per student and 100 students permit 500 observations to be made,
and, if it is the course rather than the individual student that
is to be evaluated, this number of observations is more than
enough to obtain valuable data on course objectives, instructor
performance, and changes in students.
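One way to hand out the five evaluation items per student is sketched below. The 50-item evaluation pool is a hypothetical figure (the text does not fix the pool size), and the function name is our own. The scheme simply cycles through a shuffled pool, so each student gets five different items and every item is answered by the same number of students.

```python
import random
from collections import Counter


def assign_evaluation_items(n_students, pool_size, items_per_student, seed=None):
    """Give each student a short, non-repeating block of evaluation items,
    cycling through a shuffled pool so every item is used about equally often."""
    if items_per_student > pool_size:
        raise ValueError("a student cannot receive more items than the pool holds")
    rng = random.Random(seed)
    pool = list(range(pool_size))
    rng.shuffle(pool)
    plan = []
    for s in range(n_students):
        start = s * items_per_student
        plan.append([pool[(start + j) % pool_size] for j in range(items_per_student)])
    return plan


plan = assign_evaluation_items(n_students=100, pool_size=50, items_per_student=5, seed=1)
uses = Counter(item for block in plan for item in block)
print(sum(uses.values()), min(uses.values()), max(uses.values()))  # 500 10 10
```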

Another simple but valuable use of item sampling is in
change or growth studies. If a course has 100 students and the
researcher is interested in obtaining some index of the growth
of the students with respect to a measure containing 100 items,
item sampling makes it possible to obtain data on all 100 items
at several times during the term without any student necessarily
taking any item twice. This does not eliminate all of the problems
of change studies, but it does alleviate one of the more serious
difficulties, that of the comparability of measures during the
study.
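A simple rotation that meets the "no item twice" condition is sketched below. It assumes, hypothetically, that the 100 items are split into 10 blocks of 10 and the 100 students into 10 groups of 10, with four testing occasions during the term; the function name is illustrative. On every occasion all 10 blocks (hence all 100 items) are in use, and no group meets the same block twice.

```python
def rotation_schedule(n_blocks, n_occasions):
    """schedule[t][s] is the item block given to student group s on occasion t.

    Assumes as many student groups as item blocks. Group s takes block
    (s + t) mod n_blocks, so every block is covered on every occasion and no
    group sees the same block twice while n_occasions <= n_blocks.
    """
    if n_occasions > n_blocks:
        raise ValueError("more occasions than blocks would force a repeat")
    return [[(s + t) % n_blocks for s in range(n_blocks)] for t in range(n_occasions)]


schedule = rotation_schedule(n_blocks=10, n_occasions=4)
for t, row in enumerate(schedule):
    print(f"occasion {t + 1}: {row}")
```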

When the researcher decides to use item sampling, one of
the early problems he confronts is that of presenting the
material to the subjects. There are a number of obvious
possibilities, such as giving every subject all of the items
but asking him to answer only a selected few of them. One simple
but highly ingenious technique, available if certain machine
scoring methods can be used, has been brought to our attention.*
It is simply to use mark-sense cards, with each item appearing on
one card. This method appears to be extremely flexible, and not
too costly if a sufficient number of subjects is used. It is, of
course, possible to print more than one item on a card. In that
case, though, the reduced flexibility may outweigh the cost
advantages.



*We are indebted to Jack Thomson of the University of California,
Los Angeles, for this suggestion.
Appendix 1

Computational Examples

Case I



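a) Statistical inference: $N = 10$, $\tilde{n} = 5$, and $x_{gi} = 1$ if item
$g$ is scored "correct" or $x_{gi} = 0$ if it is scored "incorrect."

The 5 x 10 matrix of item scores $x_{gi}$ (items $g$ in rows, persons
$i$ in columns) has the following margins:

Person i:     1    2    3    4    5    6    7    8    9   10
y_i:         .6   .6   .2  1.0   .4   .6   .4  0.0   .6   .6

Item g:       1    2    3    4    5
p_g:         .5   .4   .6   .5   .5

$\bar{y} = \frac{\sum_{i=1}^{N} y_i}{N} = \frac{5.0}{10} = .5 = \hat{\mu}_y$     Calculated using (1)

$\hat{\sigma}_y^2 = \frac{\sum_{i=1}^{N} (y_i - \bar{y})^2}{N - 1} = \frac{.66}{9} \approx .07$     Calculated using (3)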
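The Case I(a) arithmetic can be checked with Python's standard `statistics` module. This sketch assumes that equation (3) is the usual estimator with an N - 1 denominator; the relative scores are the ten person scores of the example.

```python
from statistics import fmean, variance

# Relative scores y_i of the N = 10 persons in the Case I example.
y = [.6, .6, .2, 1.0, .4, .6, .4, 0.0, .6, .6]

y_bar = fmean(y)         # mean relative score
sigma2_y = variance(y)   # sum of squared deviations divided by N - 1
print(round(y_bar, 2), round(sigma2_y, 3))  # 0.5 0.073
```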



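b) Psychometric inference: $\tilde{N} = 10$, $n = 5$. For purposes of
example, simply replace $N$ with $\tilde{N}$ and $\tilde{n}$ with $n$ in the
above matrix, so that the 10 persons constitute the population of
examinees and the 5 items a sample of items.

$\bar{p} = \frac{\sum_{g=1}^{n} p_g}{n} = \frac{2.5}{5} = .5 = \hat{\mu}_p$     Calculated using (5)

$\hat{\sigma}_p^2 = \frac{\sum_{g=1}^{n} (p_g - \bar{p})^2}{n - 1} = \frac{.02}{4} = .005$     Calculated using (7)

Note: $\bar{y} = .5$.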
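A similar check for Case I(b) is sketched below. It assumes item difficulties of .5, .4, .6, .5 and .5 (values consistent with $\sum p_g = 2.5$ and $\sum (p_g - \bar{p})^2 = .02$ in the example) and an n - 1 denominator in (7).

```python
from statistics import fmean, variance

# Assumed item difficulties p_g of the n = 5 items in the Case I example.
p = [.5, .4, .6, .5, .5]

p_bar = fmean(p)         # mean item difficulty
sigma2_p = variance(p)   # sum of squared deviations divided by n - 1
print(round(p_bar, 2), round(sigma2_p, 4))  # 0.5 0.005
```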

Case II: Statistical-Psychometric Inference

Suppose $\tilde{N} = 20$, $\tilde{n} = 10$, and we have reason to believe
that $\sigma_y^2$ is approximately .1 and $\sigma_p^2$ is approximately .01.
Furthermore, we are restricted to 50 item-subject observations
for reasons of time and money. Since $\sigma_y^2$ is large relative to
$\sigma_p^2$, it is advantageous to sample fewer items, say $n = 5$,
and more people, say $N = 10$. For example purposes, suppose we
get the same data matrix used in Case I.



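$\bar{y} = \frac{\sum_{i=1}^{N} y_i}{N} = .5 = \hat{\mu}_y$     Calculated using (1)

$\bar{p} = \bar{y} = .5$

$s_y^2 = \frac{N - 1}{N}\,\hat{\sigma}_y^2 = \frac{9}{10}(.07) = .065$     Calculated using (3)

$s_p^2 = \frac{n - 1}{n}\,\hat{\sigma}_p^2 = \frac{4}{5}(.005) = .004$     Calculated using (7)

$\hat{\sigma}_y^2 = \frac{N(\tilde{N} - 1)}{\tilde{n}(n - 1)\tilde{N}(N - 1)} \left\{ n(\tilde{n} - 1)\, s_y^2 - (\tilde{n} - n)\left[ \bar{y}(1 - \bar{y}) - s_p^2 \right] \right\} \approx .05$     Calculated using (10)

$\hat{\sigma}_p^2 = \frac{n(\tilde{n} - 1)}{\tilde{N}(N - 1)\tilde{n}(n - 1)} \left\{ N(\tilde{N} - 1)\, s_p^2 - (\tilde{N} - N)\left[ \bar{p}(1 - \bar{p}) - s_y^2 \right] \right\}$     (Symmetrical analogue of (10))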




The difference in the estimates of $\sigma_y^2$ and $\sigma_p^2$ between the
computational procedures of Case I and Case II is two hundredths and
five thousandths, respectively.
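Equation (10) as we read it from this example, together with its symmetrical analogue, can be sketched as below. The function and argument names are our own. The sketch assumes 0/1 item scores, simple random sampling of people and of items without replacement, $s_y^2$ and $s_p^2$ computed with N and n denominators (as in Case II), and population variances computed with $\tilde{N}$ and $\tilde{n}$ denominators. As a check, it averages the estimate over every possible sample from a small hypothetical population and compares the result with the population variance it is meant to estimate.

```python
from itertools import combinations
from statistics import fmean, pvariance


def sigma2_y_hat(N_pop, N, n_pop, n, s2_y, y_bar, s2_p):
    """Estimate of the population variance of relative scores from an
    N-person by n-item sample of an N_pop-person by n_pop-item population
    of 0/1 scores (our reading of equation (10))."""
    factor = N * (N_pop - 1) / (n_pop * (n - 1) * N_pop * (N - 1))
    return factor * (n * (n_pop - 1) * s2_y
                     - (n_pop - n) * (y_bar * (1 - y_bar) - s2_p))


def sigma2_p_hat(N_pop, N, n_pop, n, s2_p, p_bar, s2_y):
    """Symmetrical analogue: the same formula with people and items swapped."""
    return sigma2_y_hat(n_pop, n, N_pop, N, s2_p, p_bar, s2_y)


def average_over_all_samples(X, N, n):
    """Mean of sigma2_y_hat over every N-person, n-item sample of X
    (rows = people, columns = items, entries 0 or 1)."""
    N_pop, n_pop = len(X), len(X[0])
    estimates = []
    for people in combinations(range(N_pop), N):
        for items in combinations(range(n_pop), n):
            y = [fmean(X[i][g] for g in items) for i in people]
            p = [fmean(X[i][g] for i in people) for g in items]
            estimates.append(sigma2_y_hat(N_pop, N, n_pop, n,
                                          pvariance(y), fmean(y), pvariance(p)))
    return fmean(estimates)


# Hypothetical 4-person by 4-item population of item scores.
X = [[1, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 1, 0],
     [0, 1, 0, 0]]
true_sigma2_y = pvariance([fmean(row) for row in X])
print(true_sigma2_y, average_over_all_samples(X, N=2, n=2))
```

The two printed values agree up to floating-point rounding, and when the "sample" is the whole population ($N = \tilde{N}$, $n = \tilde{n}$) the estimate reduces to $s_y^2$ itself.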

Case III: Statistical-Psychometric Inference With Two Subject
Samples and One Item Sample: Simple Hypothesis Testing

Suppose we have access to 50 students and we want to compare
programmed instruction A with programmed instruction B. Suppose
we also have a fairly homogeneous pool of 20 items related to the
instructional content but are restricted, in terms of economy of
time and money, to only 100 item-person observations. Since the
test has positive internal consistency, we can, for example,
administer 5 items to each of 10 students in each instructional
treatment. That is, we conceive of the 50 students as constituting
two populations of 25 students each; from each population we
randomly select 10 students, and one group is assigned to treatment
A and the other to treatment B. Five items are selected randomly
from the pool of 20 and administered to each of the students after
their instructional treatments.

We thus have two data matrices like that presented for Case II
above. For purposes of example, suppose the data matrix for A is
that of Case II and the statistics we need for treatment B have been
calculated. We then have the following data:



$N' = 10$              $N'' = 10$

$\bar{y}' = .5$           $\bar{y}'' = .9$

$s_{y'}^2 = .065$         $s_{y''}^2 = .050$



Following the recommendation made in the paper, we can
treat the samples as independent and use the usual t-test:

$\hat{\sigma}_{\bar{y}'}^2 = .006$, $\hat{\sigma}_{\bar{y}''}^2 = .004$     Calculated using (13)

$t = 4.00$ on 18 df     Calculated using (12)

$H_0\colon \mu_{y'} = \mu_{y''}$ is rejected at $p < .001$.
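The Case III test statistic can be reproduced from the summary values, taking the estimated sampling variances of the two means as given. The function name is illustrative, and the critical value 3.922 is the two-tailed .001 point of t with 18 df from standard tables.

```python
from math import sqrt


def t_two_groups(mean_1, var_mean_1, n_1, mean_2, var_mean_2, n_2):
    """t for two independent group means, given each mean's estimated
    sampling variance; df = n_1 + n_2 - 2 as in the worked example."""
    t = (mean_2 - mean_1) / sqrt(var_mean_1 + var_mean_2)
    return t, n_1 + n_2 - 2


T_CRIT_TWO_TAILED_001_DF18 = 3.922  # from standard t tables

t, df = t_two_groups(.5, .006, 10, .9, .004, 10)
print(f"t = {t:.2f} on {df} df; reject at .001: {abs(t) > T_CRIT_TWO_TAILED_001_DF18}")
# t = 4.00 on 18 df; reject at .001: True
```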

Case IV: Statistical-Psychometric Inference With Multiple Item
Sampling and Multiple Subject Sampling

Suppose we have a 50-item test and wish to get norms for
this test on 200 people. We have, however, only the time or
money to make half of the 10,000 possible item-person observations.
One alternative is to use either Case Ia (administer, say, all
50 items to a sample of 100 people: 5000 observations) or Case Ib
(administer, say, a sample of 25 items to all 200 people: 5000
observations). In each of these cases, either subject or item
information is being lost.

Suppose, however, that we randomly select 5 non-overlapping
item samples of size 10 and administer them to 5 non-overlapping,
random examinee samples of size 40, respectively. This would
amount to 400 observations per sampling, or 2000 observations in
total. Each of the 5 sampling procedures would yield a 10 x 40
Case II-type data matrix for which $\hat{\sigma}_y^2$ and $\hat{\mu}_y$ (and any
other desired parameter estimate) could be computed as illustrated
in Case II. Suppose the following data were obtained for each
of the five samples:



Sample    $\hat{\sigma}_y^2$    $\hat{\mu}_y$
  1        .20      .63
  2        .13      .41
  3        .07      .55
  4        .09      .82
  5        .19      .57



Estimates for the entire item-subject population are
obtained by taking simple arithmetic averages of the sample
estimates. Hence,

$\hat{\sigma}_y^2 = \frac{.20 + .13 + .07 + .09 + .19}{5} = \frac{.68}{5} = .14$

$\hat{\mu}_y = \frac{.63 + .41 + .55 + .82 + .57}{5} = \frac{2.98}{5} = .60$
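The pooling step is an unweighted mean of the five sample estimates, as the text prescribes; the variable names below are illustrative.

```python
from statistics import fmean

# Estimates from the five item-sample / examinee-sample pairs in Case IV.
sigma2_hats = [.20, .13, .07, .09, .19]
mu_hats = [.63, .41, .55, .82, .57]

# Simple arithmetic averages of the sample estimates.
pooled_sigma2_y = fmean(sigma2_hats)
pooled_mu_y = fmean(mu_hats)
print(round(pooled_sigma2_y, 2), round(pooled_mu_y, 2))  # 0.14 0.6
```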



Case V: Statistical-Psychometric Inferences With Multiple Item
Sampling and Multiple Subject Sampling in the Case
of Two Subject Populations: Hypothesis Testing

Suppose the problem in Case IV is modified as follows: the
50-item test is an achievement test on the content covered by two
alternative programmed instructional sequences, A and B. We have
a group of 400 students, whom we randomly assign to the
instructional treatments. Our purpose is to test the effectiveness
of treatment A relative to treatment B under the same economic
restriction, that is, only 5000 total item-examinee observations
are possible.
We can simply extend the procedures outlined in Case IV
to our second sample of students. In other words, both examinee
samples are randomly divided into 5 non-overlapping samples of
size 40 and given the same 5 non-overlapping, random samples of 10
items, respectively. This amounts to a 5 x 2 analysis of variance
design with 40 observations per cell, which can be analyzed as
follows:

Source                   df
Item blocks               4
Treatments                1
Blocks x Treatments       4
Error                   390
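The Case V allocation can be sketched as below. The function name, the seed, and the use of integer IDs for items and students are illustrative assumptions. The sketch only builds the design and its degrees of freedom (10 cells x 40 examinees x 10 items = 4000 observations, within the 5000 allowed); the scores collected in each cell would then go into the two-factor analysis of variance laid out above.

```python
import random


def case_v_design(n_items=50, n_blocks=5, n_per_treatment=200, seed=None):
    """Randomly split the items into n_blocks non-overlapping blocks, split the
    students into treatments A and B, divide each treatment group into
    n_blocks examinee samples, and pair examinee sample b with item block b.
    Returns {(block, treatment): (students, items)}."""
    rng = random.Random(seed)
    items = list(range(n_items))
    rng.shuffle(items)
    k = n_items // n_blocks
    item_blocks = [items[b * k:(b + 1) * k] for b in range(n_blocks)]

    students = list(range(2 * n_per_treatment))
    rng.shuffle(students)
    groups = {"A": students[:n_per_treatment], "B": students[n_per_treatment:]}
    m = n_per_treatment // n_blocks
    return {(b, t): (members[b * m:(b + 1) * m], item_blocks[b])
            for t, members in groups.items() for b in range(n_blocks)}


design = case_v_design(seed=2)
observations = sum(len(s) * len(i) for s, i in design.values())
cell_size = len(design[(0, "A")][0])
df = {"item blocks": 5 - 1, "treatments": 2 - 1,
      "blocks x treatments": (5 - 1) * (2 - 1),
      "error": len(design) * (cell_size - 1)}
print(observations, df)  # 4000 observations; error df = 390
```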
Appendix 2
Glossary of Terms

1. $i$: Lowercase subscript used to identify the ith person.
2. $g$: Lowercase subscript used to identify the gth item.
3. $\mu_y$: Mean relative score on the population of items.
4. $\hat{\mu}_y$: Estimated mean relative score on the population of items.
5. $\mu_p$: Mean item difficulty for the population of examinees.
6. $\hat{\mu}_p$: Estimated mean item difficulty for the population of
   examinees.
7. $n$: Number of items in a sample of items.
8. $\tilde{n}$: Number of items in the population of items.
9. $N$: Number of people in a sample of people.
10. $\tilde{N}$: Number of people in the population of people.
11. $p_g$: The difficulty of item g.
12. $\bar{p}$: Mean item difficulty for the given sample of people.
13. $\rho$: Population Pearson product-moment correlation coefficient.
14. $r$: Sample Pearson product-moment correlation coefficient.
15. $s_p^2$: Variance of the sample of item difficulties.
16. $\sigma_p^2$: Variance of the population of item difficulties.
17. $\hat{\sigma}_p^2$: Unbiased estimate of the population variance of item
    difficulties.
18. $\sigma_{\bar{p}}^2$: Sampling variance of mean item difficulty.
19. $\hat{\sigma}_{\bar{p}}^2$: Estimate of the sampling variance of mean item difficulty.
20. $s_y^2$: Variance of the sample of relative examinee scores.
21. $\sigma_y^2$: Variance of the population of relative examinee scores.
22. $\hat{\sigma}_y^2$: Unbiased estimate of the population variance of
    relative examinee scores.
23. $\sigma_{\bar{y}}^2$: Sampling variance of mean relative score.
24. $\hat{\sigma}_{\bar{y}}^2$: Estimate of the sampling variance of mean relative score.
25. $x_{gi}$: Score on item g by examinee i.
26. $y_i$: Proportion of items in the test answered correctly by
    examinee i, i.e., relative observed score of examinee i.
27. $\bar{y}$: Mean relative score on the sample of items.
28. $\mathrm{Cov}(y', y'')$: Covariance between relative scores for the
    2 experimental groups.